This file contains the Supplementary Materials for XXX (2025). What
makes a conversation interesting? Linguistic features predictive of
interest in educational conversations between teachers and learners of
English
Linguistic predictors of human interest ratings.
Comprehensibility
This table summarizes the outcome of feature-level model comparisons
between linear regression models containing only a linear effect or both
a linear and a quadratic effect (linear and quadratic effects were
always orthogonal to each other). Separate comparisons were conducted
for each feature and for models with either average interestingness
(Int) or average expected interestingness (Exp Int) as outcome
variables. Model comparison p values < .05 indicate the model
including both a linear and quadratic predictor is better according to a
likelihood ratio test (function anova() in R).
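The comparison described above can be illustrated with a small self-contained sketch. It is written in Python rather than R, runs on synthetic data, and is not the authors' code: the function names and the simulated inverted-U data are assumptions made for illustration. It builds a centered linear term and a quadratic term orthogonalized against it, fits both nested models by least squares, and compares them with a 1-df likelihood ratio test.

```python
import math
import random


def center(values):
    """Subtract the mean so the term is orthogonal to the intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def orthogonal_terms(x):
    """Return a centered linear term and a quadratic term that is
    orthogonal to both the intercept and the linear term."""
    lin = center(x)
    quad = center([v * v for v in lin])
    slope = dot(quad, lin) / dot(lin, lin)
    quad = [q - slope * l for q, l in zip(quad, lin)]
    return lin, quad


def lrt_linear_vs_quadratic(x, y):
    """Likelihood ratio test of y ~ lin against y ~ lin + quad."""
    n = len(y)
    lin, quad = orthogonal_terms(x)
    yc = center(y)
    # With mutually orthogonal predictors, each coefficient is a projection.
    b_lin = dot(lin, yc) / dot(lin, lin)
    b_quad = dot(quad, yc) / dot(quad, quad)
    rss_lin = sum((v - b_lin * l) ** 2 for v, l in zip(yc, lin))
    rss_quad = sum(
        (v - b_lin * l - b_quad * q) ** 2 for v, l, q in zip(yc, lin, quad)
    )
    # 2 * (logLik_quadratic - logLik_linear) for Gaussian errors.
    stat = max(n * math.log(rss_lin / rss_quad), 0.0)
    # Survival function of a chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p, "quadratic" if p < 0.05 else "linear"


# Synthetic example: an inverted-U relation plus noise (not the study data).
random.seed(1)
x = [i / 10 for i in range(-20, 21)]
y = [2.0 - a * a + random.gauss(0, 0.5) for a in x]
stat, p, winner = lrt_linear_vs_quadratic(x, y)
print(f"LRT statistic = {stat:.2f}, p = {p:.4g}, winning model: {winner}")
```

Because the two terms are orthogonal, adding the quadratic term leaves the linear coefficient unchanged, so the test isolates the extra contribution of curvature.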
Table S1 - Feature-level model comparisons for comprehensibility metrics
| Feature | Int (p value) | Exp Int (p value) | Int (winning model) | Exp Int (winning model) |
| gis | 0.0000 | 0.0027 | quadratic | quadratic |
| syllable_count | 0.0000 | 0.0000 | quadratic | quadratic |
| lexicon_count | 0.0000 | 0.0000 | quadratic | quadratic |
| difficult_words | 0.0000 | 0.0000 | quadratic | quadratic |
| flesch_reading_ease | 0.0000 | 0.0000 | quadratic | quadratic |
| flesch_kincaid_grade | 0.0002 | 0.0014 | quadratic | quadratic |
| smog_index | 0.2275 | 0.3825 | linear | linear |
| coleman_liau_index | 0.0000 | 0.0000 | quadratic | quadratic |
| automated_readability_index | 0.5579 | 0.5556 | linear | linear |
| dale_chall_readability_score | 0.0000 | 0.0000 | quadratic | quadratic |
| spache_readability | 0.1531 | 0.3249 | linear | linear |
| gunning_fog | 0.0000 | 0.0000 | quadratic | quadratic |
| linsear_write_formula | 0.0000 | 0.0000 | quadratic | quadratic |
| mcalpine_eflaw | 0.0000 | 0.0000 | quadratic | quadratic |
| text_standard | 0.0000 | 0.0000 | quadratic | quadratic |
Uptake
Figure S3 shows effects of teacher_uptake_student (A) and
student_uptake_teacher (B) on average interestingness as a function of
whether the first speaker displayed on a page was the teacher or the
student.
Figure S4 shows effects of teacher_uptake_student (A) and
student_uptake_teacher (B) on average expected interestingness as a
function of whether the first speaker displayed on a page was the
teacher or the student.
Combined models
Tables S2 (Interestingness) and S3 (Expected Interestingness) show the
random effect estimates for the combined models reported in Tables 8 and
9 in the main manuscript, respectively.
Table S2 - Int ~ conc + cli + si + ari + sri + gis_lc + gis_qc + lex_lc
+ lex_qc + LCS_proc_d_numc + suthlc + cos_within_page_c + (1 |
conversation_id) + (1 | AnnId)
| grp | var1 | var2 | vcov | sdcor |
| AnnId | (Intercept) | NA | 0.350 | 0.592 |
| conversation_id | (Intercept) | NA | 0.056 | 0.237 |
| Residual | NA | NA | 0.743 | 0.862 |
Table S3 - ExpInt ~ conc + cli + si + ari + sri + gis_lc + gis_qc + lex_lc
+ lex_qc + LCS_proc_d_numc + suthlc + cos_within_page_c + (1 |
conversation_id) + (1 | AnnId)
| grp | var1 | var2 | vcov | sdcor |
| AnnId | (Intercept) | NA | 0.355 | 0.596 |
| conversation_id | (Intercept) | NA | 0.042 | 0.204 |
| Residual | NA | NA | 0.729 | 0.854 |
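In the random-effects tables of this supplement, the vcov column holds the variance estimate for each grouping factor and sdcor the corresponding standard deviation, so sdcor should equal the square root of vcov up to rounding. The following Python sketch checks this for the values printed in Tables S2 and S3; the variable names are illustrative and the tolerance is an assumption chosen to absorb three-decimal rounding.

```python
import math

# (table, group, vcov, sdcor) as printed in Tables S2 and S3
random_effects = [
    ("S2", "AnnId", 0.350, 0.592),
    ("S2", "conversation_id", 0.056, 0.237),
    ("S2", "Residual", 0.743, 0.862),
    ("S3", "AnnId", 0.355, 0.596),
    ("S3", "conversation_id", 0.042, 0.204),
    ("S3", "Residual", 0.729, 0.854),
]

TOLERANCE = 0.005  # assumed allowance for rounding to three decimals

checks = []
for table, grp, vcov, sdcor in random_effects:
    # Compare the printed SD with the square root of the printed variance.
    consistent = abs(math.sqrt(vcov) - sdcor) < TOLERANCE
    checks.append(consistent)
    print(f"Table {table}, {grp}: sqrt(vcov) = {math.sqrt(vcov):.3f}, "
          f"sdcor = {sdcor:.3f}, consistent = {consistent}")
```

The same check applies to the later random-effects tables, where each row with a feature name in var1 is the variance of a by-annotator random slope.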
Separate models for each feature category, including maximal random
slopes that could be estimated.
These are the full outputs (fixed effects, followed by random
effects) for models combining selected features from each of the three
categories (concreteness, comprehensibility, uptake) separately; these
models include the maximal random slopes that could be estimated without
convergence warnings. The model formula is reported for the
Interestingness model; in all cases we used the same model formula for
the Expected interestingness model.
Table S4: model formula for concreteness
| Feature | Formula |
| Concreteness | Int ~ conc + (1 | conversation_id) + ((1 | AnnId) + (0 + conc | AnnId)) |
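The formula in Table S4 specifies a random intercept per conversation, plus a random intercept and a separately estimated (uncorrelated) random slope of conc per annotator. A minimal simulation sketch of that structure in Python may help in reading it. The fixed effects and standard deviations are taken from Tables S5 and S6; the fully crossed design, the group sizes, the standard-normal predictor, and the function name `simulate` are all hypothetical choices, and the simulated ratings are not clipped to any rating scale.

```python
import random

random.seed(0)

# Fixed effects (Table S5) and random-effect standard deviations (Table S6).
B0, B_CONC = 2.108, -0.164
SD_ANN_INT, SD_ANN_CONC, SD_CONV, SD_RESID = 0.603, 0.101, 0.241, 0.890


def simulate(n_annotators=5, n_conversations=4):
    """Generate ratings under
    Int ~ conc + (1 | conversation_id) + (1 | AnnId) + (0 + conc | AnnId)."""
    conv_int = [random.gauss(0, SD_CONV) for _ in range(n_conversations)]
    ann_int = [random.gauss(0, SD_ANN_INT) for _ in range(n_annotators)]
    # Slopes are drawn independently of the intercepts (no correlation term).
    ann_slope = [random.gauss(0, SD_ANN_CONC) for _ in range(n_annotators)]
    rows = []
    for a in range(n_annotators):
        for c in range(n_conversations):
            conc = random.gauss(0, 1)  # hypothetical predictor value
            rating = (
                B0
                + conv_int[c]
                + ann_int[a]
                + (B_CONC + ann_slope[a]) * conc
                + random.gauss(0, SD_RESID)
            )
            rows.append((a, c, conc, rating))
    return rows


rows = simulate()
for annotator, conversation, conc, rating in rows[:3]:
    print(annotator, conversation, round(conc, 2), round(rating, 2))
```

Each annotator's effective slope is the fixed effect of conc plus that annotator's own deviation, which is what the (0 + conc | AnnId) term estimates.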
Table S5: Concreteness predicting Interestingness - fixed effects
| Feature | Beta | SE | t |
| (Intercept) | 2.108 | 0.069 | 30.496 |
| conc | -0.164 | 0.013 | -12.987 |
Table S6: Concreteness predicting Interestingness - random effects
| grp | var1 | var2 | vcov | sdcor |
| AnnId | conc | NA | 0.010 | 0.101 |
| AnnId.1 | (Intercept) | NA | 0.364 | 0.603 |
| conversation_id | (Intercept) | NA | 0.058 | 0.241 |
| Residual | NA | NA | 0.793 | 0.890 |
Table S7: Concreteness predicting Expected interestingness - fixed
effects
| Feature | Beta | SE | t |
| (Intercept) | 1.998 | 0.067 | 29.635 |
| conc | -0.115 | 0.013 | -8.912 |
Table S8: Concreteness predicting Expected interestingness - random
effects
| grp | var1 | var2 | vcov | sdcor |
| AnnId | conc | NA | 0.011 | 0.105 |
| AnnId.1 | (Intercept) | NA | 0.364 | 0.603 |
| conversation_id | (Intercept) | NA | 0.044 | 0.209 |
| Residual | NA | NA | 0.765 | 0.875 |
Table S9: model formula for comprehensibility
| Feature | Formula |
| Comprehensibility | Int ~ cli + si + gis_lc + gis_qc + lex_lc + lex_qc + (1 | conversation_id) + ((1 | AnnId) + (0 + gis_lc | AnnId) + (0 + lex_lc | AnnId) + (0 + lex_qc | AnnId)) |
Table S10: Comprehensibility predicting Interestingness - fixed effects
| Feature | Beta | SE | t |
| (Intercept) | 2.088 | 0.067 | 31.288 |
| cli | 0.037 | 0.006 | 5.817 |
| si | 0.033 | 0.008 | 3.902 |
| gis_lc | 0.017 | 0.010 | 1.757 |
| gis_qc | -0.031 | 0.006 | -4.811 |
| lex_lc | 0.207 | 0.019 | 10.714 |
| lex_qc | -0.166 | 0.018 | -9.468 |
Table S11: Comprehensibility predicting Interestingness - random effects
| grp | var1 | var2 | vcov | sdcor |
| AnnId | lex_qc | NA | 0.018 | 0.133 |
| AnnId.1 | lex_lc | NA | 0.027 | 0.163 |
| AnnId.2 | gis_lc | NA | 0.003 | 0.056 |
| AnnId.3 | (Intercept) | NA | 0.345 | 0.588 |
| conversation_id | (Intercept) | NA | 0.050 | 0.223 |
| Residual | NA | NA | 0.720 | 0.849 |
Table S12: Comprehensibility predicting Expected interestingness - fixed
effects
| Feature | Beta | SE | t |
| (Intercept) | 1.978 | 0.067 | 29.748 |
| cli | 0.028 | 0.006 | 4.514 |
| si | 0.032 | 0.008 | 3.763 |
| gis_lc | 0.006 | 0.010 | 0.612 |
| gis_qc | -0.015 | 0.006 | -2.299 |
| lex_lc | 0.167 | 0.018 | 9.083 |
| lex_qc | -0.125 | 0.017 | -7.543 |
Table S13: Comprehensibility predicting Expected interestingness -
random effects
| grp | var1 | var2 | vcov | sdcor |
| AnnId | lex_qc | NA | 0.015 | 0.123 |
| AnnId.1 | lex_lc | NA | 0.023 | 0.153 |
| AnnId.2 | gis_lc | NA | 0.003 | 0.056 |
| AnnId.3 | (Intercept) | NA | 0.359 | 0.599 |
| conversation_id | (Intercept) | NA | 0.039 | 0.197 |
| Residual | NA | NA | 0.712 | 0.844 |
Table S14: model formula for uptake
| Feature | Formula |
| Uptake | Int ~ LCS_proc_d_numc + suthlc + cos_within_page_c + (1 | conversation_id) + ((1 | AnnId) + (0 + LCS_proc_d_numc | AnnId) + (0 + suthlc | AnnId) + (0 + cos_within_page_c | AnnId)) |
Table S15: Uptake predicting Interestingness - fixed effects
| Feature | Beta | SE | t |
| (Intercept) | 2.135 | 0.070 | 30.328 |
| LCS_proc_d_numc | 0.115 | 0.010 | 11.211 |
| suthlc | 0.040 | 0.009 | 4.559 |
| cos_within_page_c | -0.062 | 0.009 | -6.745 |
Table S16: Uptake predicting Interestingness - random effects
| grp | var1 | var2 | vcov | sdcor |
| AnnId | cos_within_page_c | NA | 0.002 | 0.047 |
| AnnId.1 | suthlc | NA | 0.002 | 0.044 |
| AnnId.2 | LCS_proc_d_numc | NA | 0.004 | 0.065 |
| AnnId.3 | (Intercept) | NA | 0.371 | 0.609 |
| conversation_id | (Intercept) | NA | 0.065 | 0.254 |
| Residual | NA | NA | 0.785 | 0.886 |
Table S17: Uptake predicting Expected interestingness - fixed effects
| Feature | Beta | SE | t |
| (Intercept) | 2.019 | 0.068 | 29.611 |
| LCS_proc_d_numc | 0.081 | 0.010 | 7.702 |
| suthlc | 0.033 | 0.009 | 3.653 |
| cos_within_page_c | -0.046 | 0.010 | -4.618 |
Table S18: Uptake predicting Expected interestingness - random effects
| grp | var1 | var2 | vcov | sdcor |
| AnnId | cos_within_page_c | NA | 0.004 | 0.063 |
| AnnId.1 | suthlc | NA | 0.003 | 0.051 |
| AnnId.2 | LCS_proc_d_numc | NA | 0.005 | 0.070 |
| AnnId.3 | (Intercept) | NA | 0.369 | 0.608 |
| conversation_id | (Intercept) | NA | 0.046 | 0.215 |
| Residual | NA | NA | 0.752 | 0.867 |
Linguistic predictors of variance in interest ratings
Figures S5-S14 show the relation between linguistic features and variance
in Interestingness ratings.
Tables S19 (Interestingness) and S20 (Expected Interestingness) report
models predicting variance in human ratings.
Table S19 - Int_var ~ conc + cli + si + ari + sri + gis_lc + gis_qc +
lex_lc + lex_qc + LCS_proc_d_numc + suthlc + cos_within_page_c + (1 |
conversation_id) + (1 | project)
| Feature | Beta | SE | t |
| (Intercept) | 1.001 | 0.050 | 19.934 |
| conc | -0.016 | 0.015 | -1.091 |
| cli | -0.013 | 0.019 | -0.686 |
| si | -0.029 | 0.017 | -1.742 |
| ari | 0.048 | 0.033 | 1.463 |
| sri | -0.020 | 0.027 | -0.725 |
| gis_lc | 0.002 | 0.015 | 0.125 |
| gis_qc | -0.005 | 0.012 | -0.457 |
| lex_lc | -0.011 | 0.018 | -0.596 |
| lex_qc | 0.029 | 0.013 | 2.207 |
| LCS_proc_d_numc | -0.028 | 0.013 | -2.088 |
| suthlc | -0.014 | 0.012 | -1.109 |
| cos_within_page_c | -0.008 | 0.013 | -0.609 |
Table S20 - ExpInt_var ~ conc + cli + si + ari + sri + gis_lc + gis_qc +
lex_lc + lex_qc + LCS_proc_d_numc + suthlc + cos_within_page_c + (1 |
conversation_id) + (1 | project)
| Feature | Beta | SE | t |
| (Intercept) | 1.037 | 0.056 | 18.462 |
| conc | -0.025 | 0.015 | -1.688 |
| cli | -0.025 | 0.019 | -1.322 |
| si | -0.009 | 0.017 | -0.504 |
| ari | 0.046 | 0.033 | 1.381 |
| sri | -0.044 | 0.028 | -1.581 |
| gis_lc | -0.006 | 0.015 | -0.427 |
| gis_qc | -0.012 | 0.012 | -1.006 |
| lex_lc | -0.012 | 0.018 | -0.694 |
| lex_qc | 0.018 | 0.013 | 1.358 |
| LCS_proc_d_numc | -0.008 | 0.013 | -0.606 |
| suthlc | -0.006 | 0.012 | -0.509 |
| cos_within_page_c | -0.018 | 0.013 | -1.355 |
Proficiency
Tables S21 (Interestingness) and S22 (Expected Interestingness) report
models predicting human ratings from features and annotator/student
proficiency.
Table S21 - Int ~ level_match_numc + student_level_nc +
annotator_level_nc + conc + cli + si + ari + sri + gis_lc + gis_qc +
lex_lc + lex_qc + LCS_proc_d_numc + suthlc + cos_within_page_c + (1 |
AnnId)
| Feature | Beta | SE | t |
| (Intercept) | 2.130 | 0.070 | 30.526 |
| level_match_numc | 0.176 | 0.028 | 6.363 |
| student_level_nc | 0.046 | 0.011 | 4.364 |
| annotator_level_nc | 0.096 | 0.091 | 1.059 |
| conc | -0.022 | 0.010 | -2.075 |
| cli | 0.050 | 0.013 | 3.754 |
| si | 0.028 | 0.012 | 2.387 |
| ari | -0.038 | 0.023 | -1.637 |
| sri | 0.039 | 0.019 | 2.019 |
| gis_lc | 0.017 | 0.010 | 1.699 |
| gis_qc | -0.032 | 0.008 | -3.795 |
| lex_lc | 0.177 | 0.012 | 14.357 |
| lex_qc | -0.104 | 0.009 | -11.507 |
| LCS_proc_d_numc | 0.037 | 0.009 | 3.915 |
| suthlc | 0.024 | 0.009 | 2.851 |
| cos_within_page_c | -0.015 | 0.009 | -1.612 |
Table S22 - ExpInt ~ level_match_numc + student_level_nc +
annotator_level_nc + conc + cli + si + ari + sri + gis_lc + gis_qc +
lex_lc + lex_qc + LCS_proc_d_numc + suthlc + cos_within_page_c + (1 |
AnnId)
| Feature | Beta | SE | t |
| (Intercept) | 1.996 | 0.072 | 27.860 |
| level_match_numc | 0.194 | 0.027 | 7.143 |
| student_level_nc | 0.055 | 0.010 | 5.228 |
| annotator_level_nc | 0.069 | 0.093 | 0.740 |
| conc | -0.006 | 0.010 | -0.622 |
| cli | 0.037 | 0.013 | 2.778 |
| si | 0.022 | 0.011 | 1.956 |
| ari | -0.021 | 0.023 | -0.930 |
| sri | 0.013 | 0.019 | 0.695 |
| gis_lc | 0.006 | 0.010 | 0.645 |
| gis_qc | -0.013 | 0.008 | -1.583 |
| lex_lc | 0.159 | 0.012 | 13.112 |
| lex_qc | -0.076 | 0.009 | -8.542 |
| LCS_proc_d_numc | 0.019 | 0.009 | 2.021 |
| suthlc | 0.017 | 0.008 | 2.014 |
| cos_within_page_c | -0.014 | 0.009 | -1.535 |
Reward Prediction Error
Table S23: Linear effect of cosine similarity predicting rpe
| Feature | Beta | SE | t |
| (Intercept) | 0.118 | 0.026 | 4.511 |
| cos_proc_pages_c | 0.014 | 0.009 | 1.573 |
Table S24: Linear and quadratic effect of cosine similarity predicting
rpe
| Feature | Beta | SE | t |
| (Intercept) | 0.118 | 0.026 | 4.511 |
| cpp_lc | 0.014 | 0.009 | 1.572 |
| cpp_qc | -0.002 | 0.009 | -0.266 |